{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##Bandit Agorithms 101\n",
    "2015 August 28\n",
    "\n",
    "* See imports below for requirements\n",
    "* In the same directory you cloned Data-Science-45min-Intros, you should also clone the BanditBook repository from https://github.com/johnmyleswhite/BanditsBook"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##Agenda\n",
    "1. A simulation approach to building intuition about A/B testing\n",
    "2. Practicalities of testing and operation\n",
    "3. Optimizing outcomes with multiple options and different payoffs per option\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##1. A simulation approach to building intuition about A/B testing\n",
    "\n",
    "This section isn't a cookbook for A/B testing. Rather, I am pointing to some key aspecst of how we design and analyse A/B tests that will be useful when we get to the section on bandit algorithms.\n",
    "\n",
    "###When thinking about A/B tests, let's focus on 3 key components:\n",
    "* Treatment (e.g. content presented, vaccination, etc.)\n",
    "* Reward (e.g. impression, click, proceeds of sale, immunity)\n",
    "* Audience and context (The context is an arbitrary but specifica set of circumstances under which the audience experiences the treatment.)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "from IPython.display import Image \n",
    "Image(filename='img/treat_aud_reward.jpg')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import matplotlib\n",
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "import seaborn as sns\n",
    "import pandas as pd\n",
    "from numpy.random import binomial\n",
    "from ggplot import *\n",
    "import random\n",
    "import sys\n",
    "plt.figure(figsize=(6,6),dpi=80);\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "###Simulating A/B testing to build intuition\n",
    "\n",
    "Situation:\n",
    "* 2 treatments\n",
    "* 1 uniform audience, split into to random groups\n",
    "* reward is merely the outcome\n",
    "\n",
    "Simulation:\n",
    "* discrete outcomes\n",
    "* 50/50 chances of outcome\n",
    "* Many trials"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "Image(filename='img/ab.jpg')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Each test is like flipping a fair coin N times"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "Image(filename='img/a.jpg')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# This is A/ testing!\n",
    "# This is the result of 1 arm, 100 trials\n",
    "df = pd.DataFrame({\"coin_toss\":binomial(1,0.5,100)})\n",
    "df.hist()\n",
    "# Everyone got the same treatment, this is the distribution of the outcome\n",
    "# reward is the total height of the right-hand bar\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Run the cell above a few times.\n",
    "\n",
    "Can you easily make the case that the average reward is 50?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# every sample is 0/1, heads or tails\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# now with a high probability of heads\n",
    "df = pd.DataFrame({\"coin_toss\":binomial(1,0.6,100)})\n",
    "df.hist()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Compare the variability across many different experiments\n",
    "# of 100 flips each (variability of the mean)\n",
    "df = pd.DataFrame({\"coin_%i\"%i:binomial(1,0.5,100) for i in range(20)})\n",
    "df.hist()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Can we distinguish a small differce in probability?\n",
    "df = pd.DataFrame({\"coin_%i\"%i:binomial(1,0.52,100) for i in range(20)})\n",
    "df.hist()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "###Rewards of the actions might be different\n",
    "\n",
    "What if reward for choice 0 = -0.1 and choice 1 = 0.5?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# 1 arm\n",
    "payoff = [-0.1,0.5]\n",
    "a = np.bincount(binomial(1,0.5,100))\n",
    "print \"Number of 0s and 1s:\", a\n",
    "print \"Total reward with pay off specified =\", np.dot(a, payoff)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "###Add treatment B to make it more instesting\n",
    "\n",
    "Now we want to compare the rewards for an individuals in the audience choosing A or B. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# 2-arm, equal unity reward per coin\n",
    "# (4 outcomes but 1,0=0,1 with this payoff vector)\n",
    "payoff = [0,1,2]\n",
    "a = np.bincount(binomial(2,0.5,100))\n",
    "print a\n",
    "print np.dot(a, payoff)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "###Simulating multiple arms with different payoffs\n",
    "\n",
    "But more often, \n",
    "* Arm A (outcome 1/0) has a reward (e.g. 1.0 below) and \n",
    "* Arm B (outcome 1/0) has a different reward (eg.g 1.05 below)\n",
    "\n",
    "For example, imagine the case of tweet engagement.\n",
    "\n",
    "* a user can click on a URL in a tweet (outcome 1/0)\n",
    "* a user can retweet (outcome 1/0)\n",
    "* rewards for a site visit might be 10x those of a retweet\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "payoff1=[0,1]\n",
    "reward1 = np.dot(np.bincount(binomial(1,0.5,100)), payoff1)\n",
    "print \"Arm A reward = \", reward1\n",
    "payoff2=[0,1.05]\n",
    "reward2 = np.dot(np.bincount(binomial(1,0.5,100)), payoff2)\n",
    "print \"Arm B reward = \", reward2\n",
    "total_reward = reward1 + reward2\n",
    "print \"Total reward for arms A and B = \", total_reward"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Why worry about the total reward? I thought we wanted to know if A > B?\n",
    "\n",
    "Hold that thought for a couple of more cells...\n",
    "\n",
    "From now on, assume reward = 0 for outcome 0 to keep thing a little simpler. Everything we do can be generalized to a reward vector as above.\n",
    "\n",
    "###How easy is it to differentiat two arms with different payoffs?\n",
    "\n",
    "Lets ask about choosing a winner."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "def a_b_test(one_payoff=[1, 1.01]):\n",
    "    # assume payoff for outcome 0 is 0\n",
    "    reward1 = np.bincount(binomial(1,0.5,100))[1] * one_payoff[0]\n",
    "    reward2 = np.bincount(binomial(1,0.5,100))[1] * one_payoff[1]\n",
    "    return reward1, reward2, reward1 + reward2, reward1-reward2\n",
    "\n",
    "n_tests = 1000\n",
    "sim = np.array([a_b_test() for i in range(n_tests)])\n",
    "df = pd.DataFrame(sim, columns=[\"t1\", \"t2\", \"tot\", \"diff\"])\n",
    "print \"Number of tests in which Arm B won (expect > {} because of payoff) = {}\".format(\n",
    "    n_tests/2\n",
    "    , len(df[df[\"diff\"] <= 0.0]))\n",
    "df.hist()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now is them it jump to your power and significance testing expertise.\n",
    "\n",
    "We are going to continue building intution through simulation."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Can we differentiat two arms with different probabilities?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "def a_b_test(ps=[0.5, 0.51], one_payoff=[1, 1]):\n",
    "    reward1 = np.bincount(binomial(1,ps[0],100))[1] * one_payoff[0]\n",
    "    reward2 = np.bincount(binomial(1,ps[1],100))[1] * one_payoff[1]\n",
    "    return reward1, reward2, reward1 + reward2, reward1-reward2\n",
    "n_tests= 100\n",
    "sim = np.array([a_b_test() for i in range(n_tests)])\n",
    "df = pd.DataFrame(sim, columns=[\"t1\", \"t2\", \"tot\", \"diff\"])\n",
    "print \"Number of tests in which Arm B won (expect > {} because of probability) = {}\".format(\n",
    "    n_tests/2\n",
    "    , len(df[df[\"diff\"] <= 0.0]))\n",
    "df.hist()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "###More Arms"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "Image(filename='img/abcd.jpg')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# repeating what did before with equal equal payoff, more arms\n",
    "# remember the degenerate outcomes\n",
    "df = pd.DataFrame({\"tot_reward\":binomial(2,0.5,100)})\n",
    "df.hist()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# ok, now with 4\n",
    "df = pd.DataFrame({\"tot_reward\":binomial(4,0.5,100)})\n",
    "df.hist()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Quick aside\n",
    "\n",
    "Easy to simulate many arms with different probabilities when we have 0/1 reward.\n",
    "\n",
    "\n",
    "Maybe interesting to compare to uniform probability case?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# a little more practice with total reward distribution\n",
    "trials = 100\n",
    "probabilities = [0.1, 0.1, 0.9]\n",
    "reward = np.zeros(trials)\n",
    "for m in probabilities:\n",
    "    # equal rewards of 1 or 0\n",
    "    reward += binomial(1,m,trials)\n",
    "df = pd.DataFrame({\"reward\":reward, \"fair__uniform_reward\":binomial(3,0.5,trials)})\n",
    "df.hist()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##2. Practicalities of testing and operation\n",
    "\n",
    "###Explore vs Exploit?\n",
    "\n",
    "* Marketer is going to have a hard time waiting for rigorous testing as winners appear to emerge\n",
    "* Implement online system vs. testing system\n",
    "* Chase successes vs. analyzing failures\n",
    "\n",
    "So maybe set some new objectives instead of is A>B:\n",
    "* Want to automate balance of explore and exploit and run continuously\n",
    "* Implement one system\n",
    "\n",
    "Carefull:\n",
    "* Some danger in forgetting about significance and power"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "###Bandit Book Utilities \n",
    "from \"Bandit Algorithms\" by John Myles White\n",
    "\n",
    "Three utilities:\n",
    "* arms\n",
    "* algorithms\n",
    "* monte carlo testing framework"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "sys.path.append('../../BanditsBook/python')\n",
    "from core import *"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "##3. Optimizing outcomes with multiple options with different payoffs\n",
    "\n",
    "How can we explore arms and exploit the best arm more often, but still explore?\n",
    "\n",
    "Answer 1: occasionally, we randomly explore losers.\n",
    "\n",
    "###Epsilon Greedy\n",
    "\n",
    "1. Startup by visiting all the arms\n",
    "2. Keep track of currrent probabilities for each arm based on visits so far\n",
    "3. After startup, split experiment and exploit for each new individual from the audience with probability epsilon\n",
    "\n",
    "Notes:\n",
    "* epsilon is the fraction of exploration\n",
    "* randomize everything all the time\n",
    "\n",
    "What's a good value for $\\epsilon$?\n",
    "\n",
    "* Depends on number of arms\n",
    "* Depends on individuals per time and total time\n",
    "\n",
    "What are these parameters when one of the options is 9 times better than all of the others?\n",
    "\n",
    "Needs a simulation!\n",
    "\n",
    "(To keep it simple outcome 1/0 has reward 1/0.)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "random.seed(1)\n",
    "# Mean (arm probabilities) (Bernoulli)\n",
    "means = [0.1, 0.1, 0.1, 0.1, 0.9]\n",
    "# Mulitple arms!\n",
    "n_arms = len(means)\n",
    "random.shuffle(means)\n",
    "arms = map(lambda (mu): BernoulliArm(mu), means)\n",
    "print(\"Best arm is \" + str(ind_max(means)))\n",
    "\n",
    "t_horizon = 250\n",
    "n_sims = 1000\n",
    "\n",
    "data = []\n",
    "for epsilon in [0.1, 0.2, 0.3, 0.4, 0.5]:\n",
    "    algo = EpsilonGreedy(epsilon, [], [])\n",
    "    algo.initialize(n_arms)\n",
    "    # results are column oriented\n",
    "    # simulation_num, time, chosen arm, reward, cumulative reward\n",
    "    results = test_algorithm(algo, arms, n_sims, t_horizon)\n",
    "    results.append([epsilon]*len(results[0]))\n",
    "    data.extend(np.array(results).T)\n",
    "    \n",
    "df = pd.DataFrame(data\n",
    "            , columns = [\"Sim\"\n",
    "                         , \"T\"\n",
    "                         , \"ChosenArm\"\n",
    "                         , \"Reward\"\n",
    "                         , \"CumulativeReward\"\n",
    "                         , \"Epsilon\"])\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "a=df.groupby([\"Epsilon\", \"T\"]).mean().reset_index()\n",
    "a.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "ggplot(aes(x=\"T\",y=\"Reward\", color=\"Epsilon\"), data=a) + geom_line()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "ggplot(aes(x=\"T\",y=\"CumulativeReward\", color=\"Epsilon\"), data=a) + geom_line()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "###Anealing Softmax\n",
    "\n",
    "Upgrades to $\\epsilon$-Greedy:\n",
    "* Need to run more experiments if rewards appear to be nearly equal\n",
    "* Keep track of results for exploration as well as exploitation\n",
    "\n",
    "Tempted choose each are in proportion to its current value, i.e.:\n",
    "\n",
    "$p(A) \\propto \\frac{rA}{rA + RB}$\n",
    "\n",
    "$p(B) \\propto \\frac{rB}{rA + RB}$\n",
    "\n",
    "Remember Boltzmann, and about adding a temperature, $\\tau$:\n",
    "\n",
    "$p(A) \\propto \\frac{-\\exp(rA/\\tau)}{(\\exp(rA/\\tau) + \\exp(rB/\\tau))}$\n",
    "\n",
    "$p(B) \\propto \\frac{-\\exp(rB/\\tau)}{(\\exp(rA/\\tau) + \\exp(rB/\\tau))}$\n",
    "\n",
    "And what is annealing?\n",
    "\n",
    "* $\\tau = 0$ is deterministic case of winner takes all\n",
    "* $\\tau = \\infty$ is all random, all time\n",
    "* Let the temperature go to zero over time to settle into the state slowly (adiabatically)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "t_horizon = 250\n",
    "n_sims = 1000\n",
    "\n",
    "algo = AnnealingSoftmax([], [])\n",
    "algo.initialize(n_arms)\n",
    "data = np.array(test_algorithm(algo, arms, n_sims, t_horizon)).T\n",
    "df = pd.DataFrame(data)\n",
    "#df.head()\n",
    "df.columns = [\"Sim\", \"T\", \"ChosenArm\", \"Reward\", \"CumulativeReward\"]\n",
    "df.head()\n",
    "a=df.groupby([\"T\"]).mean().reset_index()\n",
    "a.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "ggplot(aes(x=\"T\",y=\"Reward\", color=\"Sim\"), data=a) + geom_line()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "ggplot(aes(x=\"T\",y=\"CumulativeReward\", color=\"Sim\"), data=a) + geom_line()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "###UCB2\n",
    "\n",
    "Add a confidnece measure to our estimates of averages!\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "t_horizon = 250\n",
    "n_sims = 1000\n",
    "\n",
    "data = []\n",
    "for alpha in [0.1, 0.3, 0.5, 0.7, 0.9]:\n",
    "    algo = UCB2(alpha, [], [])\n",
    "    algo.initialize(n_arms)\n",
    "    results = test_algorithm(algo, arms, n_sims, t_horizon)\n",
    "    results.append([alpha]*len(results[0]))\n",
    "    data.extend(np.array(results).T)\n",
    "    \n",
    "df = pd.DataFrame(data, columns = [\"Sim\", \"T\", \"ChosenArm\", \"Reward\", \"CumulativeReward\", \"Alpha\"])\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "a=df.groupby([\"Alpha\", \"T\"]).mean().reset_index()\n",
    "a.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "ggplot(aes(x=\"T\",y=\"Reward\", color=\"Alpha\"), data=a) + geom_line()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "ggplot(aes(x=\"T\",y=\"CumulativeReward\", color=\"Alpha\"), data=a) + geom_line()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "The end"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
